The Minkowski central partition as a pointer to a suitable distance exponent and consensus partitioning
Abstract
The Minkowski weighted K-means (MWK-means) is a recently developed clustering algorithm capable of computing feature weights. The cluster-specific weights in MWK-means follow the intuitive idea that a feature with low variance should have a greater weight than a feature with high variance. The final clustering found by this algorithm depends on the selection of the Minkowski distance exponent. This paper explores the possibility of using the central Minkowski partition in the ensemble of all Minkowski partitions for selecting an optimal value of the Minkowski exponent. The central Minkowski partition also appears to be a good consensus partition. Furthermore, we discovered some striking correlation results between the Minkowski profile, defined as a mapping of the Minkowski exponent values into the average similarity values of the optimal Minkowski partitions, and the Adjusted Rand Index vectors resulting from the comparison of the obtained partitions to the ground truth. Our findings were confirmed by a series of computational experiments involving synthetic Gaussian clusters and real-world data. © 2017 Elsevier Ltd. All rights reserved.
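The idea of a Minkowski profile and a central partition can be illustrated with a minimal sketch. It assumes that one partition (a list of cluster labels) has already been obtained for each exponent value, and that the similarity between two partitions is measured with the Adjusted Rand Index; the abstract does not say which similarity measure the profile uses, so that choice is an assumption. The function names and the toy label vectors are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(a, b):
    """Adjusted Rand Index between two label sequences of equal length."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two label sequences of equal length >= 2")
    total = comb(len(a), 2)
    # Pairs of objects placed together in both partitions, in a, and in b.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions have one cluster
    return (sum_cells - expected) / (max_index - expected)


def minkowski_profile(partitions):
    """Map each exponent p to the mean similarity of its partition to the others.

    `partitions` is a dict {p: labels}; ARI as the similarity is an assumption.
    """
    if len(partitions) < 2:
        raise ValueError("need partitions for at least two exponent values")
    profile = {}
    for p, labels in partitions.items():
        sims = [adjusted_rand_index(labels, other)
                for q, other in partitions.items() if q != p]
        profile[p] = sum(sims) / len(sims)
    return profile


def central_partition(partitions):
    """Return the exponent whose partition is, on average, closest to all others."""
    profile = minkowski_profile(partitions)
    best_p = max(profile, key=profile.get)
    return best_p, profile


# Toy ensemble: hypothetical partitions of six objects for three exponents.
toy = {
    1.5: [0, 0, 1, 1, 1, 1],
    2.0: [0, 0, 0, 1, 1, 1],
    3.0: [0, 0, 0, 0, 1, 1],
}
best_p, profile = central_partition(toy)
```

In this toy ensemble the partition for exponent 2.0 sits between the other two, so it has the highest average similarity and is selected as central. In an unsupervised setting no ground truth is needed for this selection, which is what makes the central partition usable as a pointer to the exponent.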
Similar resources
Dynamical distance as a semi-metric on nuclear configuration space
In this paper, we introduce the concept of dynamical distance on a nuclear configuration space. We partition the nuclear configuration space into disjoint classes. This classification coincides with the classical partitioning of molecular systems via the concept of conjugacy of dynamical systems. It gives a quantitative criterion to distinguish different molecular structures.
An employee transporting problem
An employee transporting problem is described and a set partitioning model is developed. An investigation of the model leads to a knapsack problem as a surrogate problem. Finding a partition corresponding to the knapsack problem provides a solution to the problem. An exact algorithm is proposed to obtain a partition (subset-vehicle combination) corresponding to the knapsack solution. It require...
A Distance-Based Packing Method for High Dimensional Data
Minkowski-sum cost model indicates that balanced data partitioning is not beneficial for high dimensional data. Thus we study several unbalanced partitioning methods and propose cost models for them based on Minkowski-sum cost model. Our cost models indicate that the distance to one of both ends of data space dominates the expected value under uniform data distribution. We generalize studied me...
Development and Application of Aqueous Two-Phase Partition for the Recovery and Separation of Recombinant Phenylalanine Dehydrogenase
Aqueous two-phase systems (ATPS) have emerged as a powerful extraction method for the downstream processing of bio-molecules. The aim of this work was to investigate the possibility of utilizing ATPS for the separation of recombinant Bacillus sphaericus phenylalanine dehydrogenase (PheDH). Polyethylene glycol (PEG) and ammonium sulfate systems were selected for our experi...
A-Wardpβ: Effective hierarchical clustering using the Minkowski metric and a fast k-means initialisation
In this paper we make two novel contributions to hierarchical clustering. First, we introduce an anomalous pattern initialisation method for hierarchical clustering algorithms, called A-Ward, capable of substantially reducing the time they take to converge. This method generates an initial partition with a sufficiently large number of clusters. This allows the cluster merging process to start f...
Journal title:
- Pattern Recognition
Volume 67, Issue
Pages -
Publication date 2017